LSH At Large - Distributed KNN Search in High Dimensions
نویسندگان
چکیده
We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. We present an efficient mapping scheme based on p-stable Locality Sensitive Hashing to assign hash buckets to peers in a Chord-style overlay network. To minimize network traffic, we process queries in an incremental top-K fashion leveraging on a locality preserving mapping to the peer space. Furthermore, we consider load balancing by harnessing estimates of the resulting data mapping, which follows a normal distribution. We report on a comprehensive performance evaluation using high dimensional real-world data, demonstrating the suitability of our approach.
منابع مشابه
Optimisation of correlation matrix memory prognostic and diagnostic systems
Condition monitoring systems for prognostics and diagnostics can enable large and complex systems to be operated more safely, at a lower cost and have a longer lifetime than is possible without them. AURA Alert is a condition monitoring system that uses a fast approximate k Nearest Neighbour (kNN) search of a timeseries database containing known system states to identify anomalous system behavi...
متن کاملRankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match as the hashing principle of LSH can be smoothly integrated in the mapping p...
متن کاملNearBucket-LSH: Efficient Similarity Search in P2P Networks
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to col...
متن کاملlsh, Nearest neighbor search in high dimensions
Calculating distance pairs is O(n2) in memory and time and finding the nearest neighbor is O(n) in time. Tree indexing techniques like kd-tree [2] were developed to cope with large n, however their performance quickly breaks down for p > 3 [3]. Locality sensitive hashing (LSH) [3] is a technique for generating hash numbers from high dimensional data, such that nearby points have identical hashe...
متن کاملScalable Locality-Sensitive Hashing for Similarity Search in High-Dimensional, Large-Scale Multimedia Datasets
Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming volume of data, while keeping low response times. Thus, scalability is imperative for similarity search in Webscale applications, but most existing methods a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008